Skip to content

Java: Convert all collection and array steps from taint flow to value flow.#5751

Merged
aschackmull merged 15 commits intogithub:mainfrom
aschackmull:java/collection-flow
Jun 3, 2021
Merged

Java: Convert all collection and array steps from taint flow to value flow.#5751
aschackmull merged 15 commits intogithub:mainfrom
aschackmull:java/collection-flow

Conversation

@aschackmull
Copy link
Copy Markdown
Contributor

@aschackmull aschackmull commented Apr 22, 2021

This converts all collection and array flow steps to use precise value flow rather than taint flow. This means that read-steps must match up with preceding store-steps, and that the tracked object can be followed more precisely through collections and arrays, both in terms of its type and in terms of the flow being applicable in pure DataFlow::Configurations. This allows more flow for value-based data flow, which is relevant for both DataFlow::Configurations and the subpaths of TaintTracking::Configuration that are restricted to value flow (i.e. when tracking through fields), and more precision in taint tracking compared to simply marking a collection or array as tainted.

As an example, FP flow like the following is no longer reported:

String taintedString = taint;
Map<K, Object> map;
map.put(k, taintedString);
Object obj = map.get(..);
byte[] bytes = (byte[])obj;
sink(bytes);

Collection and array read steps are still also treated as taint steps to support a suitable interpretation of a "tainted collection" or "tainted array".
In addition, a temporary measure to provide a large degree of backwards compatibility with ad-hoc taint flows, a set of configuration-dependent store-as-taint steps are added to support sinks and steps that read from collections/arrays.

@github-actions github-actions Bot added the Java label Apr 22, 2021
"java.util.stream;SpinedBuffer;true;copyInto;(Object[],int);;Element of Argument[-1];ArrayElement of Argument[0];value",
"java.util.stream;SpinedBuffer<>$OfPrimitive;true;copyInto;(Object,int);;Element of Argument[-1];ArrayElement of Argument[0];value",
"java.util;List;false;copyOf;(Collection);;Element of Argument[0];Element of ReturnValue;value",
"java.util;List;false;of;(Object[]);;Element of Argument[0];Element of ReturnValue;value",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOI, what does the empty of in ;of; mean?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just the name of the varargs method List::of and not related to the DSL.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good.

Copy link
Copy Markdown
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed up to but excluding NavigableMap

sink.(DataFlow::ImplicitVarargsArray).getCall() = arg.getCall()
exists(Type elemType |
arrayReadStep(src, sink, elemType) and
not elemType instanceof PrimitiveType and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on why these exclusions are made here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because numbers usually don't carry taint. I.e. by default we'll say that an int[] can be tainted without implying that the individual elements are tainted, whereas when a MyObject[] is tainted then we'll take that to mean that all of the contained MyObjects are tainted as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sure, but I'm surprised to find this changing as part of this work? Didn't we previously permit taint to travel from x to x[0] even if x was an int[]?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could say that we're adding a taint step here rather than restricting one, since the general x to x[0] step is now considered an array read step that's expected to be matched by an array store step.

Comment on lines +99 to +100
"java.util;Map<>$Entry;true;setValue;;;MapValue of Argument[-1];ReturnValue;value",
"java.util;Map<>$Entry;true;setValue;;;Argument[0];MapValue of Argument[-1];value",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this pair work as expected? (the return is the old value)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when we're targeting an argument as the output then it actually means the corresponding post-update node. So there's no overlap between these two lines, and thus should work just fine (with the obvious exception that the update isn't reflected back to the underlying map, which is a lot harder to do, since that's moving into aliasing territory).

"java.util;Map;true;get;;;MapValue of Argument[-1];ReturnValue;value",
"java.util;Map;true;getOrDefault;;;MapValue of Argument[-1];ReturnValue;value",
"java.util;Map;true;getOrDefault;;;Argument[1];ReturnValue;value",
"java.util;Map;true;put;;;MapValue of Argument[-1];ReturnValue;value",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here I suppose -- the old value is returned, but we're presumably going to wire these steps together and alias the new one

"java.util;Map;true;entrySet;;;MapKey of Argument[-1];MapKey of Element of ReturnValue;value",
"java.util;Map;true;get;;;MapValue of Argument[-1];ReturnValue;value",
"java.util;Map;true;getOrDefault;;;MapValue of Argument[-1];ReturnValue;value",
"java.util;Map;true;getOrDefault;;;Argument[1];ReturnValue;value",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing keySet?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I'll add it.

"java.util;List;true;set;;;Argument[1];Element of Argument[-1];value",
"java.util;List;true;subList;;;Element of Argument[-1];Element of ReturnValue;value",
"java.util;List;true;add;(int,Object);;Argument[1];Element of Argument[-1];value",
"java.util;List;true;set;(int,Object);;Argument[1];Element of Argument[-1];value",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant with line 142

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, will remove

"java.util;Queue;true;poll;();;Element of Argument[-1];ReturnValue;value",
"java.util;Queue;true;remove;();;Element of Argument[-1];ReturnValue;value",
"java.util;Queue;true;offer;(Object);;Argument[0];Element of Argument[-1];value",
"java.util;Deque;true;descendingIterator;();;Element of Argument[-1];ReturnValue;value",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other iterators seem to use Element of ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's a typo...

Copy link
Copy Markdown
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the rest.

I guess now is the time to consider whether all these parallel routings of MapKeys and MapValues is enough to make us wish maps were composed of Elements which themselves have Keys and Values.

Comment thread java/ql/src/semmle/code/java/dataflow/internal/ContainerFlow.qll Outdated
Comment thread java/ql/src/semmle/code/java/dataflow/internal/ContainerFlow.qll Outdated
Comment thread java/ql/src/semmle/code/java/dataflow/internal/ContainerFlow.qll Outdated
"java.util;Arrays;false;fill;(char[],char);;Argument[1];ArrayElement of Argument[0];value",
"java.util;Arrays;false;fill;(short[],int,int,short);;Argument[3];ArrayElement of Argument[0];value",
"java.util;Arrays;false;fill;(short[],short);;Argument[1];ArrayElement of Argument[0];value",
"java.util;Arrays;false;fill;(int[],int,int,int);;Argument[3];ArrayElement of Argument[0];value",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note lots of primitive propagation steps that are ruled out elsewhere?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are they ruled out?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought these would be ruled out by the numeric-type exclusions discussed in the other thread, but perhaps not?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The bulk of this work is about adding value-flow through arrays and collections by modelling the steps as proper read- and store steps (aka as field flow). This means that the PR removes those steps from the default set of taint steps, and instead adds them as general read- and store steps that use the synthetic field ArrayElement. The array read step with the type exclusions in TaintTrackingUtil.qll is a taint step and not a read step.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@aschackmull
Copy link
Copy Markdown
Contributor Author

@aschackmull
Copy link
Copy Markdown
Contributor Author

Jdk11 has still not finished its query evaluations, so performance is likely not looking very good.

@aschackmull
Copy link
Copy Markdown
Contributor Author

Jdk11 has still not finished its query evaluations, so performance is likely not looking very good

Looks like I accidentally wrote a big cartesian product. Oops.

@aschackmull
Copy link
Copy Markdown
Contributor Author

aschackmull commented May 5, 2021

@aschackmull aschackmull force-pushed the java/collection-flow branch from ac5a157 to febc063 Compare May 5, 2021 13:07
@aschackmull
Copy link
Copy Markdown
Contributor Author

aschackmull commented May 6, 2021

The full combination of precise collection read/store steps and backwards compatible taint steps appears to be too costly. Let's see if a suitable middle ground works.
https://jenkins.internal.semmle.com/job/Changes/job/Java-Differences/1380/ (skipping guava as that doesn't seem to build)
https://jenkins.internal.semmle.com/job/Changes/job/Java-Differences/1381/
https://jenkins.internal.semmle.com/job/Changes/job/Java-Differences/1383/

@aschackmull aschackmull force-pushed the java/collection-flow branch from 9312507 to d828e24 Compare May 6, 2021 11:28
@aschackmull
Copy link
Copy Markdown
Contributor Author

@aschackmull
Copy link
Copy Markdown
Contributor Author

Test tentative performance tweak: https://jenkins.internal.semmle.com/job/Changes/job/Java-Differences/1403/

@aschackmull aschackmull force-pushed the java/collection-flow branch 2 times, most recently from bb377e3 to cd87559 Compare May 19, 2021 11:30
@aschackmull
Copy link
Copy Markdown
Contributor Author

I've btw. checked the fixed results on geode. They are of the following form:

String taintedString = taint;
Map<K, Object> map;
map.put(k, taintedString);
Object obj = map.get(..);
byte[] bytes = (byte[])obj;
sink(bytes);

I've left out the irrelevant parts (some paths are > 100 steps, and the above steps are separated by many calls). With collection flow as taint we get this sort of FP flow, but with precise collection flow we are able to prune the path based on type information, as we are able to retain the type String inside the map, and when it's read back out we can see that it's incompatible with the cast to byte[]. 🎉

@yo-h
Copy link
Copy Markdown
Contributor

yo-h commented May 19, 2021

With collection flow as taint we get this sort of FP flow, but with precise collection flow we are able to prune the path based on type information

Nice!

@aschackmull aschackmull force-pushed the java/collection-flow branch from cd87559 to e21063a Compare May 20, 2021 10:56
private import semmle.code.java.dataflow.TaintTracking2

private class StoreTaintConfig extends TaintTracking::Configuration {
StoreTaintConfig() { this instanceof TaintTracking::Configuration or none() }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somewhat unusual, could you explain what's going on here? How does this not cause collisions with other implementations of TaintTracking::Configuration actually used in the query?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed unusual 😉 And yes, it collides with every other configuration. It's a hack to globally modify all other configurations with code that is itself configuration-dependent.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use this vs implementing AdditionalTaintStep? Isn't the API for isAdditionalTaintStep exactly the same and doesn't creating implementations of AdditionalTaintStep have exactly the same effect?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The Configuration::isAdditionalTaintStep predicate is configuration-specific, whereas AdditionalTaintStep is configuration-independent (i.e. global and applies to all configs).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this collection data flow only be applied to TaintTracking and TaintTracking2? Why not all of them?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are only two copies TaintTracking at the moment, so this should be all of them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, this is a temporary measure, since it is quite costly in terms of performance.

@aschackmull aschackmull force-pushed the java/collection-flow branch from c5d8278 to fc913e7 Compare June 1, 2021 09:48
@aschackmull aschackmull force-pushed the java/collection-flow branch from cb1f31a to 1c081ee Compare June 1, 2021 12:00
@aschackmull aschackmull marked this pull request as ready for review June 1, 2021 12:34
@aschackmull aschackmull requested a review from a team as a code owner June 1, 2021 12:34
containerUpdateStep(n1, n2)
}

predicate arrayStoreStep(Node node1, Node node2) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make private or document

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why wasn't this caught by the PR check?

)
}

predicate arrayReadStep(Node node1, Node node2, Type elemType) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make private or document

)
}

predicate collectionReadStep(Node node1, Node node2) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make private or document


import Cached

private module StoreTaintSteps {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document this for the benefit of external readers who ought to know this is a stop-gap / temporary measure (and what it intends to do).

@aschackmull aschackmull merged commit bd9e3d0 into github:main Jun 3, 2021
@aschackmull aschackmull deleted the java/collection-flow branch June 3, 2021 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants